Homework 4 - Michał Gromadzki

Importing libraries

Loading dataset

EDA and Preprocessing

Checking for nulls.

No nulls.

Encoding categorical features.

Checking correlation.

A strong correlation with charges is observed only for smoker.

Models

LinearRegression

RandomForest

XGBoost

Homework

Creating explainers

1. Calculating Partial Dependence Profiles

LinearRegression

As the PD profiles show, the features with a significant impact on the prediction are age, BMI and smoker. The smoker profile is the steepest, which suggests that this feature has the biggest impact on the prediction.
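To make the reading of these profiles concrete, below is a minimal sketch of how a PD profile can be computed by hand. It is not the code that produced the plots above: the data are synthetic, and the linear model with its coefficients (`coef`, `intercept`) is made up for illustration. It shows that for a linear model the PD profile of a feature is a straight line whose slope equals that feature's coefficient.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    profile = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        profile.append(predict(X_mod).mean())
    return np.array(profile)

# Synthetic stand-in data (assumption): columns are age, BMI, smoker (0/1).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(18, 65, 200),
    rng.uniform(16, 45, 200),
    rng.integers(0, 2, 200),
]).astype(float)

# Hypothetical linear model; the coefficients are invented for this sketch.
coef = np.array([250.0, 300.0, 24000.0])
intercept = -12000.0

def linear_predict(X):
    return X @ coef + intercept

bmi_grid = np.linspace(16, 45, 30)
pd_bmi = partial_dependence(linear_predict, X, feature=1, grid=bmi_grid)
pd_smoker = partial_dependence(linear_predict, X, feature=2, grid=np.array([0.0, 1.0]))

# Slope of the BMI profile between consecutive grid points.
slopes = np.diff(pd_bmi) / np.diff(bmi_grid)
# Change in the average prediction when smoker goes from 0 to 1.
smoker_step = pd_smoker[1] - pd_smoker[0]
```

In this sketch `slopes` is constant and `smoker_step` equals the smoker coefficient, which is why a steep profile can be read as a large effect of the feature.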

RandomForest

The main conclusions are the same as for the linear regression. Additionally, predicted charges increase sharply at BMI=30. The plots also show more irregularities, which could not appear in the first example because a linear model produces straight-line profiles. Moreover, it is easier to distinguish continuous features from discrete ones.

XGB

The differences between the second and third models are much smaller than between the first and the second. The jump in prediction at BMI=30 is still visible. The plots show even more irregularities than in the second example, probably because the third model is more complex than the second one. Additionally, for the first time the predictions differ by sex.

2. Calculating the Accumulated Local Dependence
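As a rough illustration of what an accumulated local profile measures, below is a hand-rolled sketch. It is not the routine used to draw the plots in this section: the data are synthetic and the model `step_predict`, with a jump at BMI=30, is hypothetical. The idea is to split the feature into quantile bins, average the change in prediction across each bin using only the observations that fall into it, and accumulate these changes.

```python
import numpy as np

def accumulated_local_effects(predict, X, feature, n_bins=10):
    """Return bin edges and the centered accumulated local effect at each edge."""
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Bin k covers (edges[k], edges[k+1]]; the minimum value goes to bin 0.
    bins = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, n_intervals - 1)
    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        rows = X[bins == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[k]
        upper[:, feature] = edges[k + 1]
        # Average prediction change across the bin, for rows inside the bin only.
        local_effects[k] = (predict(upper) - predict(lower)).mean()
        counts[k] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Center the curve so its count-weighted average over the bins is zero.
    centre = np.sum((ale[:-1] + ale[1:]) / 2 * counts) / counts.sum()
    return edges, ale - centre

# Synthetic stand-in data (assumption): columns are age and BMI.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(18, 65, 300), rng.uniform(16, 45, 300)])

# Hypothetical model with a jump at BMI=30; the numbers are invented.
def step_predict(X):
    return 250.0 * X[:, 0] + 5000.0 * (X[:, 1] > 30)

edges, ale = accumulated_local_effects(step_predict, X, feature=1)
```

In this sketch the profile is flat everywhere except in the single bin that contains BMI=30, where it rises by the size of the jump.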

LinearRegression

RandomForest

XGB

3. Comments

Partial dependence profiles look similar to CP profiles, which is expected because a PD profile is the average of the CP profiles over all observations.
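The relation between the two kinds of profiles can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic and the model `predict` is hypothetical. It computes one CP curve per observation and averages them to obtain the PD profile.

```python
import numpy as np

def cp_profiles(predict, X, feature, grid):
    """One ceteris-paribus curve per row; result has shape (n_rows, len(grid))."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # change only this feature, keep the rest
        curves[:, j] = predict(X_mod)
    return curves

# Synthetic stand-in data (assumption): columns are age, BMI, smoker (0/1).
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.uniform(18, 65, 100),
    rng.uniform(16, 45, 100),
    rng.integers(0, 2, 100),
]).astype(float)

# Hypothetical model: BMI above 30 matters only for smokers.
def predict(X):
    return 250.0 * X[:, 0] + 20000.0 * X[:, 2] * (X[:, 1] > 30)

bmi_grid = np.linspace(16, 45, 30)
curves = cp_profiles(predict, X, feature=1, grid=bmi_grid)

# The PD profile is the pointwise mean of the individual CP curves.
pd_profile = curves.mean(axis=0)
```

With this made-up model the CP curves of non-smokers are flat in BMI while those of smokers jump at 30, so the PD profile shows a smaller, averaged jump. When the CP curves are roughly parallel, the PD profile has the same shape as each of them.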